AITopics | chatbot response

chatbot response

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

You probably wouldn't notice if an AI chatbot slipped ads into its responses

AIHub, May-27-2026, 10:03:56 GMT

Hundreds of millions of people consult artificial intelligence chatbots on a daily basis for everything from product recommendations to romance, making them a tempting audience to target with potentially below-the-radar advertising. Indeed, our research suggests AI chatbots could easily be used for covert advertising to manipulate their human users. We are computer scientists who have been tracking AI safety and privacy for several years. In a study we published in an Association for Computing Machinery journal, we found that chatbots trained to embed personalized product ads in replies to queries influenced people's choices about products. And most participants didn't recognize that they were being manipulated.

artificial intelligence, machine learning, natural language, (18 more...)

AIHub

Country: North America > United States (0.17)

Genre: Research Report > New Finding (0.35)

Industry:

Information Technology > Security & Privacy (1.00)
Information Technology > Services (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Through the Judge's Eyes: Inferred Thinking Traces Improve Reliability of LLM Raters

Zhang, Xingjian, Gao, Tianhong, Jin, Suliang, Wang, Tianhao, Ye, Teng, Adar, Eytan, Mei, Qiaozhu

arXiv.org Artificial Intelligence, Oct-31-2025

Large language models (LLMs) are increasingly used as raters for evaluation tasks. However, their reliability is often limited for subjective tasks, when human judgments involve subtle reasoning beyond annotation labels. Thinking traces, the reasoning behind a judgment, are highly informative but challenging to collect and curate. We present a human-LLM collaborative framework to infer thinking traces from label-only annotations. The proposed framework uses a simple and effective rejection sampling method to reconstruct these traces at scale. These inferred thinking traces are applied to two complementary tasks: (1) fine-tuning open LLM raters; and (2) synthesizing clearer annotation guidelines for proprietary LLM raters. Across multiple datasets, our methods lead to significantly improved LLM-human agreement. Additionally, the refined annotation guidelines increase agreement among different LLM models. These results suggest that LLMs can serve as practical proxies for otherwise unrevealed human thinking traces, enabling label-only corpora to be extended into thinking-trace-augmented resources that enhance the reliability of LLM raters.
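The rejection-sampling step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `TraceGenerator`, `infer_trace`, and `toy_generate` are hypothetical names, and `toy_generate` stands in for an LLM that returns a reasoning trace together with a predicted label. A sampled trace is accepted only when its label agrees with the human annotation.

```python
from typing import Callable, Optional

# Hypothetical interface: given an item and a sampling seed, return
# (thinking_trace, predicted_label). In practice this would be an LLM call.
TraceGenerator = Callable[[str, int], tuple[str, str]]

def infer_trace(item: str, human_label: str, generate: TraceGenerator,
                max_tries: int = 8) -> Optional[str]:
    """Rejection sampling: return the first sampled trace whose predicted
    label matches the human label, or None if every sample is rejected."""
    for seed in range(max_tries):
        trace, label = generate(item, seed)
        if label == human_label:
            return trace  # accepted: consistent with the human annotation
    return None  # every sampled trace disagreed with the human label

def toy_generate(item: str, seed: int) -> tuple[str, str]:
    """Deterministic stand-in for an LLM: alternates labels by seed."""
    label = "positive" if seed % 2 else "negative"
    return f"Reasoning (seed {seed}) about {item!r} leads to {label}.", label

print(infer_trace("great movie", "positive", toy_generate))
```

Accepted traces could then feed either use the abstract describes (fine-tuning an open rater or synthesizing guidelines); that step is not shown here.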

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2510.2586

Country:

North America > United States > Minnesota (0.28)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

MedScore: Generalizable Factuality Evaluation of Free-Form Medical Answers by Domain-adapted Claim Decomposition and Verification

Huang, Heyuan, DeLucia, Alexandra, Tiyyala, Vijay Murari, Dredze, Mark

arXiv.org Artificial Intelligence, Oct-21-2025

While Large Language Models (LLMs) can generate fluent and convincing responses, they are not necessarily correct. This is especially apparent in the popular decompose-then-verify factuality evaluation pipeline, where LLMs evaluate generations by decomposing them into individual, valid claims. Factuality evaluation is especially important for medical answers, since incorrect medical information could seriously harm the patient. However, existing factuality systems are a poor match for the medical domain, as they are typically only evaluated on objective, entity-centric, formulaic texts such as biographies and historical topics. These differ from medical answers, which are condition-dependent, conversational, hypothetical, diverse in sentence structure, and subjective, making decomposition into valid facts challenging. We propose MedScore, a new pipeline to decompose medical answers into condition-aware valid facts and verify them against in-domain corpora. Our method extracts up to three times more valid facts than existing methods, reducing hallucination and vague references, and retaining condition-dependency in facts. The resulting factuality score varies substantially with the decomposition method, the verification corpus, and the backbone LLM used, highlighting the importance of customizing each step for reliable factuality evaluation by using our generalizable and modularized pipeline for domain adaptation.
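The generic decompose-then-verify pipeline that MedScore adapts can be sketched as follows. This is a toy illustration, not MedScore: `decompose` (one claim per sentence) and `verify` (content-word overlap with a corpus passage) are hypothetical stand-ins for the LLM-based decomposition and in-domain verification the abstract describes, and the score is assumed to be the fraction of supported claims.

```python
import re

def decompose(answer: str) -> list[str]:
    """Naive stand-in for LLM claim decomposition: one claim per sentence."""
    parts = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [p for p in parts if p]

def verify(claim: str, corpus: list[str]) -> bool:
    """Naive stand-in for retrieval plus verification: a claim counts as
    supported if all its longer words appear in a single corpus passage."""
    words = {w for w in re.findall(r"[a-z]+", claim.lower()) if len(w) > 3}
    return any(words <= set(re.findall(r"[a-z]+", p.lower())) for p in corpus)

def factuality_score(answer: str, corpus: list[str]) -> float:
    """Fraction of decomposed claims that the corpus supports."""
    claims = decompose(answer)
    if not claims:
        return 0.0
    supported = sum(verify(c, corpus) for c in claims)
    return supported / len(claims)

corpus = ["Amoxicillin is an antibiotic used to treat bacterial infections."]
print(factuality_score("Amoxicillin is an antibiotic. It cures viral infections.", corpus))
```

Here the first claim is supported and the second is not, so the score is 0.5. The naive splitter also keeps the vague reference "It" in the second claim, which is the kind of decomposition problem the abstract says MedScore is designed to reduce.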

information, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2505.18452

Country:

North America > United States (1.00)
Asia (1.00)
Europe (0.92)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Large language models provide unsafe answers to patient-posed medical questions

Draelos, Rachel L., Afreen, Samina, Blasko, Barbara, Brazile, Tiffany L., Chase, Natasha, Desai, Dimple Patel, Evert, Jessica, Gardner, Heather L., Herrmann, Lauren, House, Aswathy Vaikom, Kass, Stephanie, Kavan, Marianne, Khemani, Kirshma, Koire, Amanda, McDonald, Lauren M., Rabeeah, Zahraa, Shah, Amy

arXiv.org Artificial Intelligence, Aug-6-2025

Millions of patients are already using large language model (LLM) chatbots for medical advice on a regular basis, raising patient safety concerns. This physician-led red-teaming study compares the safety of four publicly available chatbots--Claude by Anthropic, Gemini by Google, GPT-4o by OpenAI, and Llama3-70B by Meta--on a new dataset, HealthAdvice, using an evaluation framework that enables quantitative and qualitative analysis. In total, 888 chatbot responses are evaluated for 222 patient-posed advice-seeking medical questions on primary care topics spanning internal medicine, women's health, and pediatrics. We find statistically significant differences between chatbots. The rate of problematic responses varies from 21.6 percent (Claude) to 43.2 percent (Llama), with unsafe responses varying from 5 percent (Claude) to 13 percent (GPT-4o, Llama). Qualitative results reveal chatbot responses with the potential to lead to serious patient harm. This study suggests that millions of patients could be receiving unsafe medical advice from publicly available chatbots, and further work is needed to improve the clinical safety of these powerful tools.
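The reported figures can be sanity-checked with simple arithmetic: 4 chatbots × 222 questions gives the 888 evaluated responses, and each percentage applies to one chatbot's 222 responses. The counts below are only approximations recovered by rounding the published percentages, and the pooled two-proportion z-test is an illustrative choice; the abstract does not say which test the authors used.

```python
import math

N_CHATBOTS = 4
N_QUESTIONS = 222  # each chatbot answered all 222 questions

# Sanity check on the reported total: 4 x 222 = 888 evaluated responses.
assert N_CHATBOTS * N_QUESTIONS == 888

def implied_count(rate: float, n: int = N_QUESTIONS) -> int:
    """Approximate count recovered by rounding a reported percentage."""
    return round(rate * n)

def two_proportion_z(k1: int, k2: int, n1: int, n2: int) -> float:
    """Pooled two-proportion z statistic for the difference p2 - p1."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

claude = implied_count(0.216)  # 47.952 rounds to 48 problematic responses
llama = implied_count(0.432)   # 95.904 rounds to 96 problematic responses
z = two_proportion_z(claude, llama, N_QUESTIONS, N_QUESTIONS)
print(claude, llama, round(z, 2))
```

With these approximate counts, z comes out near 4.9, well above the conventional 1.96 cutoff for a two-sided test at the 5 percent level, which is consistent with the reported significant difference between the best and worst chatbots.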

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.18905

Country: North America > United States > California > San Francisco County > San Francisco (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Building Trust in Mental Health Chatbots: Safety Metrics and LLM-Based Evaluation Tools

Park, Jung In, Abbasian, Mahyar, Azimi, Iman, Bounds, Dawn, Jun, Angela, Han, Jaesu, McCarron, Robert, Borelli, Jessica, Li, Jia, Mahmoudi, Mona, Wiedenhoeft, Carmen, Rahmani, Amir

arXiv.org Artificial Intelligence, Aug-3-2024

Key Words: Mental health chatbots, large language models, clinical safety, evaluation metrics, automated assessment. Word Count: 3,686.
ABSTRACT
Objective: This study aims to develop and validate an evaluation framework to ensure the safety and reliability of mental health chatbots, which are increasingly popular due to their accessibility, human-like interactions, and context-aware support.
Materials and Methods: We created an evaluation framework with 100 benchmark questions and ideal responses, and five guideline questions for chatbot responses. This framework, validated by mental health experts, was tested on a GPT-3.5-turbo-based chatbot. Automated evaluation methods explored included large language model (LLM)-based scoring, an agentic approach using real-time data, and embedding models to compare chatbot responses against ground truth standards. The agentic method, dynamically accessing reliable information, demonstrated the best alignment with human assessments.
Discussion: Our findings emphasize the need for comprehensive, expert-tailored safety evaluation metrics for mental health chatbots. While LLMs have significant potential, careful implementation is necessary to mitigate risks. The superior performance of the agentic approach underscores the importance of real-time data access in enhancing chatbot reliability. Future work should extend evaluations to accuracy, bias, empathy, and privacy to ensure holistic assessment and responsible integration into healthcare. Standardized evaluations will build trust among users and professionals, facilitating broader adoption and improved mental health support through technology.
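One of the automated methods above compares a chatbot response with an expert-written ideal response via embeddings, typically by cosine similarity. The sketch assumes that setup; `embed` is a hypothetical stand-in that uses bag-of-words counts instead of a real embedding model, so it only captures word overlap, not meaning.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def score_response(response: str, ideal: str) -> float:
    """Similarity of a chatbot response to the ground-truth ideal response."""
    return cosine(embed(response), embed(ideal))

print(score_response("Please call a doctor now", "Call a doctor"))
```

With a real embedding model the vectors would be dense and paraphrases would score highly even without shared words, which is why the abstract compares this approach against LLM-based and agentic scoring rather than relying on overlap alone.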

chatbot, mental health chatbot, safety, (15 more...)

arXiv.org Artificial Intelligence

2408.0465

Country:

North America > United States > California > Orange County > Irvine (0.15)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches

Abeysinghe, Bhashithe, Circi, Ruhan

arXiv.org Artificial Intelligence, Jun-13-2024

Chatbots have been an interesting application of natural language generation since its inception. With novel transformer-based generative AI methods, building chatbots has become trivial, and chatbots targeted at specific domains, for example medicine and psychology, are being implemented rapidly. This, however, should not distract from the need to evaluate chatbot responses, especially because the natural language generation community does not entirely agree on how to evaluate such applications effectively. In this work, we discuss the issue further, focusing on the increasingly popular LLM-based evaluations and how they correlate with human evaluations. Additionally, we introduce a comprehensive factored evaluation mechanism that can be used in conjunction with both human and LLM-based evaluations. We present the results of an experimental evaluation conducted using this scheme on one of our chatbot implementations, which consumed educational reports, and subsequently compare automated evaluation, traditional human evaluation, factored human evaluation, and factored LLM evaluation. The results show that factor-based evaluation gives better insight into which aspects of LLM applications need improvement, and further strengthen the argument for using human evaluation in critical spaces where the main functionality is not direct retrieval.

chatbot, evaluation, human evaluation, (15 more...)

arXiv.org Artificial Intelligence

2406.03339

Country:

North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
(5 more...)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Evaluator for Emotionally Consistent Chatbots

Liu, Chenxiao, Deng, Guanzhi, Ji, Tao, Tang, Difei, Zheng, Silai

arXiv.org Artificial Intelligence, Dec-2-2021

One challenge for evaluating current sequence- or dialogue-level chatbots, such as Empathetic Open-domain Conversation Models, is to determine whether the chatbot performs in an emotionally consistent way. The most recent work only evaluates on the aspects of context coherence, language fluency, response diversity, or logical self-consistency between dialogues. In this research, we aim to train an evaluator that can effectively evaluate the emotional consistency of chatbots.
1.2 Related Work
Empathetic dialogues. There are studies (Rashkin et al., 2019; Li et al., 2017; Zhou et al., 2018; Sheen, 2021) that provide ...

chatbot, dataset, utterance, (14 more...)

arXiv.org Artificial Intelligence

2112.01616

Country: North America > United States > California (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

Build a simple Chatbot using NLTK Library in Python - Analytics Vidhya

#artificialintelligence, Jul-1-2021, 06:35:40 GMT

How amazing it is to talk to someone, asking and telling them anything, and not be judged at all. That's the beauty of a chatbot. A chatbot is AI-based software, an application of NLP, that interacts with users to handle their specific queries without human intervention. It is a smart application that reduces human work and helps an organization resolve customers' basic queries. Today, most companies and businesses across sectors use chatbots in different ways to reply to their customers as fast as possible. A chatbot typically asks customers for basic information such as their name, email address, and query.
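The article's NLTK-based bot is not reproduced here, but the underlying rule-based idea (match the user's message against regular-expression patterns and return a canned response) can be sketched with the standard library alone. NLTK's `nltk.chat.util.Chat` class is built around similar pattern/response pairs; the rules, `respond`, and `FALLBACK` below are hypothetical and simply mirror the name, email, and query flow described above.

```python
import re

# Hypothetical (pattern, response template) rules, checked in order.
# {0} is filled with the first captured group of the matching pattern.
PAIRS = [
    (r"my name is (.*)", "Hello {0}, what is your email address?"),
    (r"(\S+@\S+\.\S+)", "Thanks, we will reply to {0}. What is your query?"),
    (r"(hi|hello|hey)\b.*", "Hello! May I have your name?"),
    (r"quit", "Goodbye!"),
]
FALLBACK = "Sorry, I did not understand. Could you rephrase?"

def respond(message: str) -> str:
    """Return the response of the first rule that matches the whole message."""
    for pattern, template in PAIRS:
        match = re.fullmatch(pattern, message.strip(), re.IGNORECASE)
        if match:
            # Extra positional arguments are ignored by str.format.
            return template.format(*match.groups())
    return FALLBACK

for line in ["Hi there", "My name is Asha", "asha@example.com"]:
    print(respond(line))
```

Rule order matters: the more specific patterns come first, and anything that matches no rule falls through to the fallback reply, which is where a production bot would hand off to a human agent.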

application, chatbot, customer, (11 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

Chatbots in a nutshell - The Digital Transformation People

#artificialintelligence, May-22-2020, 07:32:51 GMT

Marketing scientist Kevin Gray asks Dr. Anna Farzindar of the University of Southern California about chatbots and the ways they are used. Is there a formal definition you prefer? Conversational or dialog agents are designed to communicate with us in human language. These software agents are deployed everywhere around us: when talking to your car, communicating with robots, or using your personal assistant on any device or smartphone, such as Alexa, Cortana, Siri, or Google Assistant. The term "chatbot" is often used in industry for conversational agents that can be integrated through any online messaging application.

artificial intelligence, chatbot, natural language, (17 more...)

#artificialintelligence

Country: North America > United States > California (0.55)

Industry:

Health & Medicine > Therapeutic Area (0.76)
Information Technology > Security & Privacy (0.75)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

How to Build Basic Chatbot Without Coding and Deploy to Websites

#artificialintelligence, Feb-17-2020, 12:26:45 GMT

A chatbot is a self-learning conversational bot that imitates human conversation through text chats and voice commands (good examples being Siri or Amazon Alexa). There are two broad kinds. Task-handling chatbots execute the task you ask for more easily: for example, if you ask one to book a table at a restaurant or open a website, it performs the operation on your mobile or laptop and lands you on the page you asked for, or orders the pizza for you. AI-based chatbots learn over time using machine learning techniques; Dialogflow is an example of a platform for building them. The use of chatbots, mostly by businesses, will only increase as time goes by. No prior programming experience is required, because Google Dialogflow is the platform where all the machine learning algorithms are trained on the back end. Go to the Dialogflow Console.
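Dialogflow handles intent matching on its back end, so no code is needed there. To show the idea behind it, here is a deliberately simple sketch that maps an utterance to the intent whose example training phrases share the most words with it. The intent names, phrases, and threshold are hypothetical, and word-overlap (Jaccard) scoring is a stand-in, not Dialogflow's actual machine learning model.

```python
import re

# Hypothetical intents with example training phrases, loosely mirroring how
# an agent is configured in the console. Matching is simple word overlap,
# NOT Dialogflow's actual model.
INTENTS = {
    "book_table": ["book a table at a restaurant", "reserve a table for two"],
    "order_pizza": ["order a pizza", "i want a pizza delivered"],
    "open_website": ["open the website", "go to a web page"],
}

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Shared words divided by total distinct words."""
    return len(a & b) / len(a | b) if a | b else 0.0

def match_intent(utterance: str, threshold: float = 0.2) -> str:
    """Return the best-scoring intent, or "fallback" if none is close enough."""
    words = tokens(utterance)
    best, best_score = "fallback", 0.0
    for intent, phrases in INTENTS.items():
        score = max(jaccard(words, tokens(p)) for p in phrases)
        if score > best_score:
            best, best_score = intent, score
    return best if best_score >= threshold else "fallback"

print(match_intent("Please book a table for tonight"))
```

A real agent would then extract parameters (date, party size, and so on) and run a fulfillment step; utterances that score below the threshold fall through to a fallback intent.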

appointment, chatbot, chatbot response, (11 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
